Preparations

Load the necessary libraries

library(car)       #for regression diagnostics
library(broom)     #for tidy output
library(ggfortify) #for model diagnostics
library(sjPlot)    #for outputs
library(knitr)     #for kable
library(effects)   #for partial effects plots
library(emmeans)   #for estimating marginal means
library(ggeffects) #for plotting marginal means
library(MASS)      #for glm.nb
library(MuMIn)     #for AICc
library(tidyverse) #for data wrangling
library(modelr)    #for auxillary modelling functions
library(performance) #for residuals diagnostics
library(see)         #for plotting residuals
library(DHARMa)    #for residual diagnostics plots
library(patchwork) #grid of plots
library(scales)    #for more scales
theme_set(theme_classic())

Scenario

Here is a modified example from Quinn and Keough (2002). Day and Quinn (1989) described an experiment that examined how rock surface type affected the recruitment of barnacles to a rocky shore. The experiment had a single factor, surface type, with 4 treatments or levels: algal species 1 (ALG1), algal species 2 (ALG2), naturally bare surfaces (NB) and artificially scraped bare surfaces (S). There were 5 replicate plots for each surface type and the response (dependent) variable was the number of newly recruited barnacles on each plot after 4 weeks.

Six-plated barnacle

Format of day.csv data files

treat barnacle
ALG1 27
.. ..
ALG2 24
.. ..
NB 9
.. ..
S 12
.. ..
treat Categorical listing of surface types. ALG1 = algal species 1, ALG2 = algal species 2, NB = naturally bare surface, S = scraped bare surface.
barnacle The number of newly recruited barnacles on each plot after 4 weeks.

Read in the data

As we are going to treat Treatment as a categorical predictor, we will specifically declare it as such straight after importing the data.

day = read_csv('../data/day.csv', trim_ws=TRUE)
glimpse(day)
## Rows: 20
## Columns: 2
## $ TREAT    <chr> "ALG1", "ALG1", "ALG1", "ALG1", "ALG1", "ALG2", "ALG2", "ALG…
## $ BARNACLE <dbl> 27, 19, 18, 23, 25, 24, 33, 27, 26, 32, 9, 13, 17, 14, 22, 1…
day <- day %>% janitor::clean_names() %>%
  mutate(treat = fct_relevel(treat, c("NB", "S", "ALG1", "ALG2")))

Exploratory data analysis

Model formula: \[ y_i \sim{} \mathcal{Pois}(\lambda_i)\\ \mu_i = \boldsymbol{\beta} \bf{X_i} \]

where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the intercept and treatment contrasts for the effects of Treatment on barnacle recruitment.

ggplot(day, aes(y=barnacle, x=treat)) +
    geom_boxplot()+
    geom_point(color='red')

ggplot(day, aes(y=barnacle, x=treat)) +
    geom_violin()+
    geom_point(color='red')

Fit the model

day_mod <- glm(barnacle ~ treat, data = day, family = poisson(link = "log"))

Model validation

autoplot(day_mod, which = 1:6)

DHARMa::simulateResiduals(day_mod, plot=T)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.8519155 0.2442125 0.1780693 0.5192783 0.7018186 0.2069486 0.845717 0.4237525 0.3734334 0.797562 0.05894113 0.3008211 0.7262854 0.3287856 0.9518321 0.4402874 0.06980402 0.6823056 0.9620096 0.2676171 ...

All looks good!

Cook’s d is not needed if you only have categorical predictors

Partial plots

plot_model(day_mod, type = 'eff')
## $treat

Model investigation / hypothesis testing

summary(day_mod)
## 
## Call:
## glm(formula = barnacle ~ treat, family = poisson(link = "log"), 
##     data = day)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6748  -0.6522  -0.2630   0.5699   1.7380  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.7081     0.1155  23.452  < 2e-16 ***
## treatS       -0.1278     0.1688  -0.757   0.4488    
## treatALG1     0.4010     0.1492   2.688   0.0072 ** 
## treatALG2     0.6383     0.1427   4.472 7.75e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 54.123  on 19  degrees of freedom
## Residual deviance: 17.214  on 16  degrees of freedom
## AIC: 120.34
## 
## Number of Fisher Scoring iterations: 4
exp(2.58)
## [1] 13.19714

exp(2.58) = 13.2 is the number of barnacles in the

Predictions

Post-hoc test (Tukey’s)

Planned contrasts

Define your own

Compare:

  1. ALG1 vs ALG2
  2. NB vs S
  3. average of ALG1+ALG2 vs NB+S

Summary figures

References

Quinn, G. P., and K. J. Keough. 2002. Experimental Design and Data Analysis for Biologists. London: Cambridge University Press.